;;; -*- Syntax: Common-Lisp; Package: (AUTOCLASS CL); Base: 10; Mode: TEXT -*-
;;; File: Autoclass-X:doc;checkpoint.text
;;;------------------------------------------------------------------------;;;
;;; AUTOCLASS 3.0 Released 5/90 contact: Taylor@pluto.arc.nasa.gov         ;;;
;;; by P. Cheeseman, J. Stutz, R. Hanson, W. Taylor                        ;;;
;;; NASA Ames Research Center, MS 244-17, Moffett Field, CA 94035          ;;;
;;;                                                                        ;;;
;;; Copyright (C) 1990 Research Institute for Advanced Computer Science.   ;;;
;;; All rights reserved. The RIACS Software Policy contains specific       ;;;
;;; terms and conditions on the use of this software, and must be          ;;;
;;; distributed with any copies. THIS FILE MAY BE REDISTRIBUTED. This      ;;;
;;; copyright and notice must be preserved in all copies made of this file.;;;
;;;------------------------------------------------------------------------;;;

;;; added 6/06/90 for 3.0.2

Checkpointing:

With very large databases there is a significant probability of a system
crash during any one classification try.  Under such circumstances it is
advisable to take the time to checkpoint the calculations for possible
restart.

The code modifications given at the end of this file provide for
checkpointing the current state of a classification at the end of each basic
convergence cycle.  The code can either be loaded after the regular AutoClass
system is loaded, or substituted for the standard definition of Base-Cycle.
It provides a modified version of Base-Cycle that checks the value of the
global variable *checkpoint-file*.  So long as *checkpoint-file* is nil,
there is no apparent change from the standard Base-Cycle.  When
*checkpoint-file* is a pathname, Base-Cycle uses Save-Clsf-Seq to save a
compressed version of the classification.  On Symbolics systems this uses
Dump-Object-To-File to make a binary version.  This is much quicker than
writing a normal ASCII file, but is not human readable.  On systems where
Dump-Object-To-File is not defined, saved files are written in ASCII.
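
The nil-versus-pathname behaviour described above can be illustrated with a
small self-contained sketch.  It uses only standard Common Lisp and is not
the AutoClass implementation: every name beginning with "sketch-" is
hypothetical, the "state" is just a list of integers, and the update step is
a placeholder for a real convergence cycle.  The ASCII writer supersedes any
earlier file version on each save.

```lisp
;; Hypothetical sketch; none of these names are part of AutoClass.
(defvar *sketch-checkpoint-file* nil
  "NIL disables checkpointing; otherwise a pathname to write to.")

(defun sketch-save-state (state pathname)
  "Write STATE readably (ASCII) to PATHNAME, superseding any earlier version."
  (with-open-file (out pathname :direction :output
                                :if-exists :supersede
                                :if-does-not-exist :create)
    (with-standard-io-syntax
      (print state out)))
  pathname)

(defun sketch-load-state (pathname)
  "Read back the state written by SKETCH-SAVE-STATE."
  (with-open-file (in pathname :direction :input)
    (with-standard-io-syntax
      (read in))))

(defun sketch-cycle (state)
  "Run one stand-in convergence step, then checkpoint if enabled."
  (let ((new-state (mapcar #'1+ state)))  ; placeholder for the real update
    (when *sketch-checkpoint-file*
      (sketch-save-state new-state *sketch-checkpoint-file*))
    new-state))
```

With *sketch-checkpoint-file* left at nil, sketch-cycle behaves as if no
checkpointing code were present; binding it to a pathname makes every cycle
write a file that sketch-load-state can read back after a restart.
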
If you choose to write your own version of Dump-Object-To-File, please send
us a copy so we can pass it on.  Note that checkpointing will slow the search
process, noticeably so when writing out in ASCII.  With ASCII output,
successive file versions supersede the previous ones.

To recover the classification after rebooting do:

  (setf clsf (first (Get-Clsf-Seq *checkpoint-file* :expand t :wts t)))

If needed, this will cause the appropriate database and models to be loaded,
provided there has been no change in their filenames since the time they were
loaded for the checkpointed classification.

Protocols:

The standard search control function, Autoclass-Search, has no facilities for
restarting from a partially converged classification.  However it does
quickly find a good distribution for the number of classes to start with for
any particular combination of initialization and search.  We therefore advise
that before trying to classify the full database, one apply Autoclass-Search
to several small randomly chosen fractions of the database, without using
checkpointing.  This will give a good measure of the minimum number of
classes to expect, and may suggest ways to improve the model.

If there is no great chance of a system crash during a full classification,
we suggest continuing with Autoclass-Search.  Use the same arguments as
before, except that :start-J-list should be something like a list of the
number of classes in the best six classifications seen so far (i.e. found
during the best of the data subset searches).  Assuming you have just
completed the initial search, this could be done by:

  (setf search *)  ;; grabbing the results of the initial search.
  (setf j-list (map 'list #'search-try-j-in
                    (safe-subseq (search-tries search) 0 6)))
  (autoclass-search :start-j-list j-list .....)  ;; With the full data set.

If you experience, or expect, a system crash in every few classifications, it
would be better to use Find-Best-N-4 to search for good classifications.
This is a primitive version of Autoclass-Search which has a restart
capability (which means that it can begin again with a checkpointed
classification after a crash).  Find-Best-N-4's search arguments are similar
to those for AutoClass-Search, but it requires an existing classification as
its primary input.  Use Generate-Clsf to make an initial classification, and
the checkpointed classification for restarts.  The primary arguments to
Generate-Clsf are the pathnames for the data, header and model files.  See
the specific functions for information on the omitted arguments, some of
which are required.  Set up :start-j-list as described above.

Start with:

  (setf *checkpoint-file* (make-pathname .....))
  (setf clsf (generate-clsf :n-classes 1 :start-fn 'block-set-clsf ....))
  (find-best-n-4 clsf :start-j-list j-list .....)

And restart with:

  (setf clsf (first (Get-Clsf-Seq *checkpoint-file* :expand t :wts t)))
  (find-best-n-4 clsf :restart t :start-j-list j-list .....)

If you find that most classifications crash, you might as well go to a fully
manual search.  This starts by initializing a classification with a hopefully
appropriate number of classes.  You then apply one of the search functions
(see *try-fn-list*) and keep restarting from the checkpointed version until
the search ends naturally.
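
The "keep restarting from the checkpointed version until the search ends
naturally" pattern can be sketched as a toy loop.  This is only an
illustration under stated assumptions: all "demo-" names are hypothetical, a
variable stands in for the checkpoint file, and a signalled condition stands
in for a system crash (a real crash would need a reboot and a manual reload
of the checkpoint).

```lisp
;; Hypothetical sketch: every name below is a stand-in, not AutoClass code.
(defvar *demo-checkpoint* nil
  "Stand-in for the checkpoint file: holds the last saved state.")

(define-condition demo-crash (error) ()
  (:documentation "Simulates a system crash during a search."))

(defun demo-try-fn (state limit crash-at)
  "Count STATE up to LIMIT, checkpointing after each cycle.
Signals DEMO-CRASH when the count reaches CRASH-AT."
  (loop while (< state limit)
        do (incf state)
           (setf *demo-checkpoint* state)   ; checkpoint after each cycle
           (when (eql state crash-at)
             (error 'demo-crash)))
  state)

(defun demo-run-until-done (initial limit crash-at)
  "Restart DEMO-TRY-FN from the last checkpoint until it ends normally.
Returns the final state and the number of restarts."
  (let ((state initial)
        (restarts 0))
    (loop
      (handler-case
          (return (values (demo-try-fn state limit crash-at) restarts))
        (demo-crash ()
          (incf restarts)
          ;; Resume from the checkpoint, not from the initial state.
          (setf state *demo-checkpoint*))))))
```

Because each restart begins from the last checkpointed state, work completed
before the simulated crash is not repeated.
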
Alternate classifications are rated by the marginal posterior in the
log-a<X/H> field [use (clsf-log-a<X/H> clsf)].  A useful tactic, used for the
IRAS classification with AutoClass II, is to make a number of starts that are
only converged to a coarse limit and choose the best of these for further
convergence.

Start with:

  (setf *checkpoint-file* (make-pathname .....))
  (setf clsf (generate-clsf :n-classes N-CLASSES ....))
  (<try-fn> clsf .....)

And restart with:

  (setf clsf (first (Get-Clsf-Seq *checkpoint-file* :expand t :wts t)))
  (<try-fn> clsf .....)

Note that in this case you will have to manage the search for the best number
of classes yourself, by choosing N-CLASSES each time.  Try a range of
possibilities, as suggested by previous searches on partial data sets, and
gradually focus in on those that seem to give the best results, as measured
by (clsf-log-a<X/H> clsf).

;;;------------------------------------------------------------------------;;;

(defvar *checkpoint-file* nil
  "When checkpointing is necessary, set this to be the pathname of your
   checkpoint file.  Otherwise it MUST be nil.  Note that the file type will
   be overridden by the AutoClass standard types.")

(defun Base-Cycle (clsf &key (stream t) display-wts)
  "Special checkpointing version of the standard Update-Wts, Update-Parameters,
   and Update-Approximations cycle."
  (declare (special *checkpoint-file*))
  (UPDATE-WTS clsf)
  (let ((n-stored (DELETE-NULL-CLASSES clsf)))
    (when (and display-wts (plusp n-stored))
      (format stream "~&~3D null classes stored from base-cycle." n-stored)))
  (UPDATE-PARAMETERS clsf)
  (UPDATE-APPROXIMATIONS clsf)
  (if display-wts (display-step clsf stream))
  ;; A non-nil value that is not a pathname is an error: stop so the user
  ;; can reset *checkpoint-file* and continue.
  (when (and *checkpoint-file* (not (pathnamep *checkpoint-file*)))
    (break "Checkpointing: reset *checkpoint-file* to a pathname or nil and continue."))
  ;; Save only when checkpointing is (still) enabled.
  (when *checkpoint-file*
    (SAVE-CLSF-SEQ (list clsf) *checkpoint-file* :binary t))
  (clsf-log-a<X/H> clsf))

*** I am confused about the interaction between new Base-Cycle and
autoclass-search, find-best-n-4, & <try-fn>: are you saying to use
autoclass-search (with old Base-Cycle) to find minimum number of classes,
then load new Base-Cycle and run either find-best-n-4 or <try-fn>?  I don't
understand the difference between using find-best-n-4 & <try-fn>.